
Dataverse SDK For Python

Dataverse is an MLOps platform that assists with data selection, data visualization, and model training in computer vision. Use the Dataverse SDK for Python to interact with the Dataverse platform from Python. Currently, the library supports:

  • Create Project with your input ontology and sensors
  • Get Project by project-id
  • Create Dataset from your AWS/Azure storage or local
  • Get Dataset by dataset-id
  • List models for your selected project-id
  • Get and download your model

Package (PyPI) | Source code

Getting started

Install the package

pip install dataverse-sdk

Prerequisites: You must have a Dataverse Platform account and Python 3.10+ to use this package.

Create the client

Interaction with the Dataverse site starts with an instance of the DataverseClient class. You need the site URL, an email account, and its password (or an access token) to instantiate the client object.

from dataverse_sdk import *
from dataverse_sdk.connections import get_connection
from dataverse_sdk.constants import DataverseHost

client = DataverseClient(
    host=DataverseHost.PRODUCTION.value, email="XXX", password="***", service_id="xxxx-xxxx-xx-xxx", alias="default", force=False
)
assert client is get_connection("default")

# Provide a different alias for each workspace you connect to
client2 = DataverseClient(
    host=DataverseHost.PRODUCTION.value, email="account-2", password="***", service_id="xxxx-xxxx-xx-xxx", alias="client2", force=False
)
assert client2 is get_connection(client2.alias)

# Authenticate with an access_token instead of a password
client3 = DataverseClient(
    host=DataverseHost.PRODUCTION.value, email="XXX", password="", service_id="xxxx-xxxx-xx-xxx", access_token="xxx"
)
assert client3 is get_connection(client3.alias)
  • Input arguments:

| Argument name | Type/Options | Default   | Description                                                        |
|---------------|--------------|-----------|--------------------------------------------------------------------|
| host          | str          | *--       | the host url of the dataverse site (with curation port)           |
| email         | str          | *--       | the email account of your dataverse workspace                      |
| password      | str          | *--       | the password of your dataverse workspace                           |
| service_id    | str          | *--       | the service id of the dataverse workspace you want to connect to   |
| alias         | str          | 'default' | the connection alias of your dataverse client                      |
| force         | bool         | False     | whether to force-replace the connection if the given alias exists  |
| access_token  | str          | None      | used instead of a password for authentication                      |

Key concepts

Once you've initialized a DataverseClient, you can interact with Dataverse from the initialized object.

Examples

The following sections provide examples for the most common Dataverse tasks, including:

Get User

The get_user method returns the current user's info, including role, permissions, and other user details.

user = client.get_user()

List Projects

The list_projects method lists all projects on the connected site.

  • Example Usage:
projects = client.list_projects(current_user=True,
                                exclude_sensor_type=SensorType.LIDAR,
                                image_type=OntologyImageType._2D_BOUNDING_BOX)
  • Input arguments:

| Argument name       | Type/Options                         | Default | Description                                         |
|---------------------|--------------------------------------|---------|-----------------------------------------------------|
| current_user        | bool                                 | True    | only show the projects of the current user          |
| exclude_sensor_type | SensorType.CAMERA / SensorType.LIDAR | None    | exclude the projects with the given sensor type     |
| image_type          | OntologyImageType._2D_BOUNDING_BOX / OntologyImageType.SEMANTIC_SEGMENTATION / OntologyImageType.CLASSIFICATION / OntologyImageType.POINT / OntologyImageType.POLYGON / OntologyImageType.POLYLINE | None | only include the projects with the given image type |

Create Project

The create_project method creates a project on the connected site with the defined ontology and sensors.

  • Example Usage:
# 1) Create ontology with ontologyclass object
ontology = Ontology(
    name="sample ontology",
    image_type=OntologyImageType._2D_BOUNDING_BOX,
    pcd_type=None,
    classes=[
        OntologyClass(name="Pedestrian", rank=1, color="#234567"),
        OntologyClass(name="Truck", rank=2, color="#345678"),
        OntologyClass(name="Car", rank=3, color="#456789"),
        OntologyClass(name="Cyclist", rank=4, color="#567890"),
        OntologyClass(name="DontCare", rank=5, color="#6789AB"),
        OntologyClass(name="Misc", rank=6, color="#789AB1"),
        OntologyClass(name="Van", rank=7, color="#89AB12"),
        OntologyClass(name="Tram", rank=8, color="#9AB123"),
        OntologyClass(name="Person_sitting", rank=9, color="#AB1234"),
    ],
)

For a project with a camera sensor, there is only one image_type per project, chosen from [OntologyImageType._2D_BOUNDING_BOX, OntologyImageType.SEMANTIC_SEGMENTATION, OntologyImageType.CLASSIFICATION, OntologyImageType.POINT, OntologyImageType.POLYGON, OntologyImageType.POLYLINE].

For a project with a lidar sensor, you should assign pcd_type=OntologyPcdType.CUBOID in the ontology, as in the sketch below.
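
A minimal sketch of a lidar ontology, assuming image_type may be left as None for a lidar-only project (mirroring how the camera example above sets pcd_type=None); the class names are illustrative:

# Sketch: ontology for a lidar project (assumption: image_type=None is
# acceptable when only lidar data is annotated; class names illustrative)
lidar_ontology = Ontology(
    name="sample lidar ontology",
    image_type=None,
    pcd_type=OntologyPcdType.CUBOID,
    classes=[
        OntologyClass(name="Car", rank=1, color="#456789"),
        OntologyClass(name="Pedestrian", rank=2, color="#234567"),
    ],
)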

# 2) Create your sensor list with name / SensorType
sensors = [
    Sensor(name="camera1", type=SensorType.CAMERA),
    Sensor(name="lidar1", type=SensorType.LIDAR),
]

# 3) Create your project tag attributes (Optional)
project_tag = ProjectTag(
    attributes=[
        {"name": "year", "type": "number"},
        {
            "name": "unknown_object",
            "type": "option",
            "options": [{"value": "fire"}, {"value": "leaves"}, {"value": "water"}],
        },
    ]
)

# 4) Create your project with your ontology/sensors/project_tag
project = client.create_project(name="Sample project", ontology=ontology, sensors=sensors, project_tag=project_tag)
  • Input arguments for creating project:

| Argument name | Type/Options | Default | Description                                        |
|---------------|--------------|---------|----------------------------------------------------|
| name          | str          | *--     | name of your project                               |
| ontology      | Ontology     | *--     | the Ontology basemodel data of the current project |
| sensors       | list[Sensor] | *--     | the list of Sensor basemodel data of your project  |
| project_tag   | ProjectTag   | None    | your project tags                                  |
| description   | str          | None    | your project description                           |

*--: required argument without default


Get Project

The get_project method retrieves the project from the connected site. The project_id parameter is the unique integer ID of the project, not its "name" property.

project = client.get_project(project_id=1, client_alias=client.alias)  # if client_alias is not provided, it is taken from the client

Edit Project

To edit project contents, the four methods below let you add or edit project tags and ontology classes.

Add New Project Tags

  • Note: Cannot create a project tag that already exists!
tag = {
    "attributes": [
        {"name": "month", "type": "number"},
        {
            "name": "weather",
            "type": "option",
            "options": [{"value": "sunny"}, {"value": "rainy"}, {"value": "cloudy"}],
        },
    ]
}
project_tag = ProjectTag(**tag)
# should provide client_alias if calling from the client
client.add_project_tag(project_id=10, project_tag=project_tag, client_alias=client.alias)
# OR
project.add_project_tag(project_tag=project_tag)

Edit Project Tags

Note:

  1. Cannot edit a project tag that does not exist
  2. Cannot modify the data type of existing project tags
  3. Cannot provide attributes with existing options
tag = {
    "attributes": [
        {
            "name": "weather",
            "type": "option",
            "options": [{"value": "unknown"}, {"value": "snowy"}],
        },
    ]
}
project_tag = ProjectTag(**tag)
# should provide client_alias if calling from the client
client.edit_project_tag(project_id=10, project_tag=project_tag, client_alias=client.alias)
# OR
project.edit_project_tag(project_tag=project_tag)

Add New Ontology Classes

  • Note: Cannot add an ontology class that already exists!
new_classes = [
    OntologyClass(
        name="obstruction",
        rank=9,
        color="#AB4321",
        attributes=[{
            "name": "status",
            "type": "option",
            "options": [{"value": "static"}, {"value": "moving"}],
        }],
    )
]
# should provide client_alias if calling from the client
client.add_ontology_classes(project_id=24, ontology_classes=new_classes, client_alias=client.alias)
# OR
project.add_ontology_classes(ontology_classes=new_classes)

Edit Ontology Classes

Note:

  1. Cannot edit an ontology class that does not exist
  2. Cannot modify the data type of existing ontology class attributes
  3. Cannot provide attributes with existing options
edit_classes = [
    OntologyClass(
        name="obstruction",
        color="#AB4321",
        attributes=[{
            "name": "status",
            "type": "option",
            "options": [{"value": "unknown"}],
        }],
    )
]
# should provide client_alias if calling from the client
client.edit_ontology_classes(project_id=24, ontology_classes=edit_classes, client_alias=client.alias)
# OR
project.edit_ontology_classes(ontology_classes=edit_classes)

Update Ontology Alias

  1. Get the csv file of the alias map for your project:

client.generate_alias_map(project_id=123, alias_file_path="./alias.csv")

  2. Fill in the aliases in the csv file and save it (DO NOT modify other fields).

  3. Update the aliases for your project with the alias file path:

client.update_alias(project_id=123, alias_file_path="/Users/Downloads/alias.csv")

Create Dataset

Use create_dataset to import a dataset from cloud storage:

dataset_data = {
    "name": "Dataset 1",
    "data_source": DataSource.Azure,  # or DataSource.AWS
    "storage_url": "storage/url",
    "container_name": "azure container name",
    "data_folder": "datafolder/to/vai_anno",
    "sensors": project.sensors,  # listed as required in the argument table below
    "type": DatasetType.ANNOTATED_DATA,
    "annotation_format": AnnotationFormat.VISION_AI,
    "annotations": ["groundtruth"],
    "sequential": False,
    "render_pcd": False,
    "generate_metadata": False,
    "sas_token": "azure sas token",  # only for azure storage
    "access_key_id": "aws s3 access key id",  # only for private s3 buckets; omit for public s3 buckets or azure data sources
    "secret_access_key": "aws s3 secret access key",  # only for private s3 buckets; omit for public s3 buckets or azure data sources
}
dataset = project.create_dataset(**dataset_data)
  • Input arguments for creating dataset from cloud storage:

| Argument name     | Type/Options                                      | Default | Description                                                               |
|-------------------|---------------------------------------------------|---------|---------------------------------------------------------------------------|
| name              | str                                               | *--     | name of your dataset                                                      |
| data_source       | DataSource.Azure / DataSource.AWS                 | *--     | the data source of your dataset                                           |
| storage_url       | str                                               | *--     | your cloud storage url                                                    |
| container_name    | str                                               | None    | azure container name                                                      |
| data_folder       | str                                               | *--     | the relative data folder from the storage_url and container               |
| sensors           | list[Sensor]                                      | *--     | the list of Sensors of your dataset (one or more of the project's sensors) |
| type              | DatasetType.ANNOTATED_DATA / DatasetType.RAW_DATA | *--     | your dataset type (annotated or raw data)                                 |
| annotation_format | AnnotationFormat.VISION_AI / AnnotationFormat.KITTI / AnnotationFormat.COCO / AnnotationFormat.YOLO / AnnotationFormat.IMAGE | *-- | the format of your annotation data |
| annotations       | list[str]                                         | None    | list of names for your annotation data folders, such as ["groundtruth"]   |
| sequential        | bool                                              | False   | whether the data is sequential                                            |
| render_pcd        | bool                                              | False   | whether to render pcd preview images                                      |
| generate_metadata | bool                                              | False   | whether to generate image metadata                                        |
| description       | str                                               | None    | your dataset description                                                  |
| sas_token         | str                                               | None    | SAS token for the azure container                                         |
| access_key_id     | str                                               | None    | access key id for a private AWS s3 bucket                                 |
| secret_access_key | str                                               | None    | secret access key for a private AWS s3 bucket                             |

*--: required argument without default
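
As a complement to the Azure-flavored example above, here is a minimal sketch for a private AWS S3 source; the bucket URL, folder, and credentials are illustrative placeholders:

# Sketch: importing from a private AWS S3 bucket (all values illustrative)
aws_dataset_data = {
    "name": "Dataset 2",
    "data_source": DataSource.AWS,
    "storage_url": "https://your-bucket.s3.amazonaws.com",  # placeholder bucket url
    "data_folder": "datafolder/to/vai_anno",
    "sensors": project.sensors,
    "type": DatasetType.ANNOTATED_DATA,
    "annotation_format": AnnotationFormat.VISION_AI,
    "annotations": ["groundtruth"],
    "access_key_id": "YOUR_ACCESS_KEY_ID",          # omit for public buckets
    "secret_access_key": "YOUR_SECRET_ACCESS_KEY",  # omit for public buckets
}
aws_dataset = project.create_dataset(**aws_dataset_data)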


Use create_dataset to import a dataset from LOCAL:

dataset_data2 = {
    "name": "dataset-local-upload",
    "data_source": DataSource.LOCAL,
    "storage_url": "",
    "container_name": "",
    "data_folder": "/YOUR/TARGET/LOCAL/FOLDER",
    "sensors": project.sensors,
    "type": DatasetType.ANNOTATED_DATA, # or DatasetType.RAW_DATA for images
    "annotation_format": AnnotationFormat.VISION_AI,
    "annotations": ["groundtruth"],  # remove it when type is DatasetType.RAW_DATA
    "sequential": False,
    "generate_metadata": False,
    "sas_token": ""
}
dataset2 = project.create_dataset(**dataset_data2)

You could also use the script below to import a dataset from a local folder:

python tools/import_dataset_from_local.py -host https://staging.visionai.linkervision.ai/dataverse/curation -e {your-account-email} -p {PASSWORD} -s {service-id}  -project {project-id} --folder {/YOUR/TARGET/LOCAL/FOLDER} -name {dataset-name} -type {raw_data OR annotated_data} -anno {image OR vision_ai} --sequential

List and Get Dataset

The list_datasets method returns the list of datasets under the given project.

project = client.get_project(project_id=1)
datasets: list = project.list_datasets()

OR

datasets: list = client.list_datasets(project_id=1, client_alias=client.alias)

The get_dataset method retrieves the dataset info from the connected site. The dataset_id parameter is the unique integer ID of the dataset, not its "name" property.

dataset = client.get_dataset(dataset_id=5)

List and Get Dataslices

# list dataslices with project_id
client.list_dataslices(project_id=101, client_alias=client.alias)

# Get target dataslice data
dataslice_data = client.get_dataslice(dataslice_id=504)

Export Dataslice and Download

# Trigger export and get export record id
export_record = client.export_dataslice(dataslice_id=504)
# Use export record id to download export data
client.download_export_dataslice_data(dataslice_id=504, export_record_id=export_record["export_record_id"])

List Models

The list_models method will list all the models in the given project. You can filter models by type using the type parameter.

Basic Usage

# Method 1: Using client
models = client.list_models(project_id=1, client_alias=client.alias)

# Method 2: Using project object
project = client.get_project(project_id=1)
models = project.list_models()

Filtering by Model Type

You can filter models by type using strings or lists of strings. The SDK supports multiple model types:

# Filter by single type using string
models = client.list_models(project_id=1, type="trained", client_alias=client.alias)

# Filter by single type using list
models = client.list_models(project_id=1, type=["trained"], client_alias=client.alias)

# Filter by multiple types using list
models = client.list_models(
    project_id=1,
    type=["trained", "byom", "uploaded"],
    client_alias=client.alias
)

Available Model Types

| String Value | Description          |
|--------------|----------------------|
| "trained"    | Trained models       |
| "byom"       | Bring Your Own Model |
| "uploaded"   | Uploaded models      |

Input Arguments

| Argument name | Type/Options                                             | Default             | Description              |
|---------------|----------------------------------------------------------|---------------------|--------------------------|
| project_id    | int                                                      | *--                 | the project ID           |
| client_alias  | str                                                      | None                | the client alias         |
| type          | "trained", "byom", "uploaded", or a list of these values | ["trained", "byom"] | model types to filter by |

*--: required argument without default


Get Model

The get_model method gets the model's detail info by the given model ID.

model = client.get_model(model_id=30, client_alias=client.alias)
model = project.get_model(model_id=30)

From the given model, you can get the model convert records as below.

model_record = client.get_convert_record(convert_record_id=1, client_alias=client.alias)
OR
model_record = model.get_convert_record(convert_record_id=1)

  • If the converted model format is onnx, you could download the model as below.
# Get the target convert record, and download labels.txt and model.onnx
model_record = model.get_convert_record(convert_record_id=5)
status, label_file_path = model_record.get_label_file(save_path="./labels.txt", timeout=6000)
status, onnx_model_path = model_record.get_onnx_model_file(save_path="./model.onnx", timeout=6000)
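
As a quick sanity check outside the SDK, you could load the downloaded file with the onnxruntime package (an assumption of this sketch; install it with pip install onnxruntime):

# Sketch: verify the downloaded ONNX model loads (not part of dataverse-sdk)
import onnxruntime as ort

session = ort.InferenceSession(onnx_model_path)
print([inp.name for inp in session.get_inputs()])   # input tensor names
print([out.name for out in session.get_outputs()])  # output tensor names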

Create VQA Project

The create_vqa_project method creates a project on the connected site with the defined questions and answer types.

  • Example Usage:
# 1) Create question classes with question / answer type pairs
question_answer = [
    QuestionClass(class_name="question1", rank=1, question="Is any person found in the picture?",
                  answer_type="boolean"),
    QuestionClass(class_name="question2", rank=2, question="What is the blob color of the traffic light?",
                  answer_type="option", answer_options=["red", "yellow", "green"]),
]

# 2) Create your VQA project as below
project = client.create_vqa_project(name="vqa-project", sensor_name="camera1", ontology_name="vqa-ontology",
                                    question_answer=question_answer)
  • Input arguments for creating project:
| Argument name   | Type/Options        | Default | Description                     |
|-----------------|---------------------|---------|---------------------------------|
| name            | str                 | *--     | name of your project            |
| sensor_name     | str                 | *--     | the camera sensor name          |
| ontology_name   | str                 | *--     | the ontology name               |
| question_answer | list[QuestionClass] | *--     | your question/answer_type pairs |
| description     | str                 | None    | your project description        |

*--: required argument without default


Edit VQA Ontology

Note:

  1. Cannot edit a question's answer type
  2. Cannot update with existing answer options
  3. Cannot add a question with an existing rank id
create_questions = [QuestionClass(class_name="question3", rank=3, question="Age?", answer_type="number")]
update_questions = [{"rank": 2, "question": "What is the blob color of the traffic light? (the closest one)", "options": ["black"]}]

# should provide client_alias if calling from the client
client.edit_vqa_ontology(project_id=24, ontology_name="ontology-new-name",
                         create=create_questions,
                         update=update_questions,
                         client_alias=client.alias)
# OR
project.edit_vqa_ontology(ontology_name="ontology-new-name",
                          create=create_questions,
                          update=update_questions)

Get Question List

The method below retrieves the question list of a VQA project, which helps you prepare annotated data.

output = client.get_question_list(project_id=107, output_file_path="./question.json")
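
Since the questions are also written to the given output_file_path, you could inspect them offline; a minimal sketch, assuming the file is JSON as the .json extension suggests (the exact schema is not documented here):

# Sketch: load the exported question file (assumes JSON content)
import json

with open("./question.json") as f:
    questions = json.load(f)
print(questions)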

Quick Tools

Import Your Local Dataset

python tools/import_dataset_from_local.py -host https://staging.visionai.linkervision.ai/dataverse/curation -e {your-account-email} -p {PASSWORD} -s {service-id}  -project {project-id} --folder {/YOUR/TARGET/LOCAL/FOLDER} -name {dataset-name} -type {raw_data OR annotated_data} -anno {image OR vision_ai} --sequential

Import VQA Local Dataset

python tools/import_vqa_dataset.py -host https://staging.visionai.linkervision.ai/dataverse/curation -e {your-account-email} -p {PASSWORD} -s {service-id} -project {project-id} --folder {/YOUR/TARGET/LOCAL/FOLDER} -type {raw_data OR annotated_data}

Export Dataslice and download files

python tools/export_dataslice.py -host https://staging.visionai.linkervision.ai/dataverse/curation  -e {your-account-email} -p {PASSWORD} -s {service-id} -dataslice {dataslice_id} -f {/YOUR/TARGET/LOCAL/file.zip}

Export Large Dataslice and download files

python tools/export_dataslice_large.py -host https://visionai.linkervision.ai/dataverse/curation -e {your-account-email} -p {PASSWORD} -s {service-id} -dataslice {dataslice_id} --anno {export-model-name / groundtruth} --target_folder {folder path} --export-format {coco, visionai, yolo, vlm ...etc}

Upload videos to create session tasks

python tools/upload_videos_create_session.py -host https://visionai.linkervision.ai/dataverse/curation -e {your-account-email} -p {PASSWORD} -s {service-id} -f {/YOUR/VIDEOS/LOCAL/FOLDER} -n {session-name}
  • Advanced arguments for video curation (sequential data):

| Argument name                 | Type/Options | Default  | Description |
|-------------------------------|--------------|----------|-------------|
| --video-curation              | bool         | False    | enable video curation (sequential data) |
| --global-mean-threshold       | float        | 0.001    | threshold for the video's global average motion magnitude (0.000001 ~ 0.01); higher values are stricter (flag more clips as low-motion), lower values are looser (flag fewer clips) |
| --per-patch-256-min-threshold | float        | 0.000001 | minimum average motion magnitude allowed in any 256x256 pixel patch (0.000001 ~ 0.0001); higher values are stricter per patch (flag more clips when any 256x256 patch is too still), lower values are looser (flag fewer clips) |
| --split-duration              | int          | 5        | length of each split clip in seconds (2 ~ 30s) |
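
For example, to enable video curation with custom thresholds (this sketch assumes --video-curation is a boolean switch; the threshold values are illustrative):

python tools/upload_videos_create_session.py -host https://visionai.linkervision.ai/dataverse/curation -e {your-account-email} -p {PASSWORD} -s {service-id} -f {/YOUR/VIDEOS/LOCAL/FOLDER} -n {session-name} --video-curation --global-mean-threshold 0.002 --per-patch-256-min-threshold 0.00001 --split-duration 10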

Links to language repos

Python Readme
